# EEGUnity Kernels: Dataset-Specific In-Memory Preprocessing ## 1. Introduction EEGUnity provides a unified interface for parsing, preprocessing, and managing EEG datasets. However, many public datasets are not fully standardized: - Event markers may be stored in separate `.mat`, `.tsv`, or `.csv` files. - Subject metadata may exist in independent tables. - Channel naming conventions may vary across releases. - Folder structures may differ between mirrors or versions. To address this variability **without duplicating EEG data**, EEGUnity introduces the concept of external kernels. A kernel is a dataset-specific, in-memory preprocessing plugin that runs automatically when data is read. ------------------------------------------------------------------------ ## 2. Why Use Kernels? Traditional workflows often: 1. Load raw data 2. Run dataset-specific preprocessing scripts 3. Export a new standardized dataset copy This approach duplicates EEG arrays and complicates maintenance. Kernels solve this by: - Running at read time - Updating `mne.io.Raw` objects in memory - Attaching metadata and annotations dynamically - Leaving the original dataset untouched ------------------------------------------------------------------------ ## 3. How Kernels Work When binding a kernel: ``` python from eegunity import UnifiedDataset ud = UnifiedDataset( dataset_path="/data/openneuro/ds005505", domain_tag="openneuro_ds005505", kernel_spec="/abs/path/openneuro_ds005505_kernel" ) raw = ud.eeg_parser.get_data(0) ``` Internally: 1. EEGUnity loads the Raw object. 2. The external kernel is loaded dynamically. 3. The system calls: ``` python kernel.apply(udataset, raw, row) ``` If the kernel fails, EEGUnity emits a warning and returns the unmodified raw. ------------------------------------------------------------------------ ## 4. Kernel File Requirements Each kernel file must: 1. Be a single Python module 2. Define exactly one object named: ``` python KERNEL = YourKernelClass() ``` 3. Implement: ``` python apply(udataset, raw, row) -> raw ``` One file equals one kernel. No suffix such as `:KERNEL` is required. Valid kernel specifications: - File path (extension optional): "/abs/path/figshare_largemi_kernel" - Module import path: "my_private_kernels.figshare_largemi_kernel" ------------------------------------------------------------------------ ## 5. Recommended Naming Convention Kernel names should reflect the dataset source: - figshare_xxxx - openneuro_ds005505 - kaggle_xxxx - bcic_iv_2a Inside the kernel class: ``` python KERNEL_ID = "figshare_largemi" ``` ------------------------------------------------------------------------ ## 6. Kernel Interface Specification Required structure: ``` python class SomeKernel: def apply(self, udataset, raw, row): ... return raw KERNEL = SomeKernel() ``` Parameters: - udataset: dataset-level context - raw: loaded MNE Raw object - row: locator row (contains "File Path") Return the modified raw object. ------------------------------------------------------------------------ ## 7. Determining Dataset Root If instantiated with dataset_path, use it directly. If instantiated with locator_path only: 1. Use udataset.get_shared_attr()\["dataset_path"\] if available. 2. Otherwise compute common minimal prefix of all File Path entries. 3. Fallback to directory of row\["File Path"\]. ------------------------------------------------------------------------ ## 8. Writing Robust Kernels To support dataset variants: - Avoid hardcoded paths - Search recursively for participants or event files - Tolerate alternate column names - Handle missing metadata gracefully - Avoid assuming fixed folder structures Focus on robust logic. EEGUnity handles exception safety. ------------------------------------------------------------------------ ## 9. Minimal Kernel Template ``` python from __future__ import annotations import json from dataclasses import dataclass import mne @dataclass class ExampleKernel: KERNEL_ID: str = "source_name" def apply(self, udataset, raw: mne.io.BaseRaw, row) -> mne.io.BaseRaw: description_dict = { "original_description": raw.info.get("description", ""), "eegunity_description": { "source_name": self.KERNEL_ID }, } raw.info["description"] = json.dumps(description_dict) return raw KERNEL = ExampleKernel() ``` ------------------------------------------------------------------------ ## 10. Summary Kernels allow EEGUnity to remain lightweight, avoid licensing issues, and support diverse datasets through dynamic, in-memory preprocessing.